Decorators¶
Intro to Decorators¶
Decorators are a feature of Python that allows extra functionality to be added to objects. A simple example of this is a decorator which formats the output for you:
[1]:
# actual decorator name, takes a function as an argument
def format_to_string(func):
    # internal decorator parser, this does the "work"
    def formatter(*args, **kwargs):
        # create this as a simple parser, returning the result in a string
        return f"The result is: '{func(*args, **kwargs)}'"
    return formatter

# the @ syntax calls our function as a decorator
@format_to_string
def multiply(x, y):
    return x * y

# let's see what it does to the output
multiply(4, 7)
[1]:
"The result is: '28'"
This example shows a basic use of a decorator. Python also ships built-in decorators; a good example is @classmethod, which binds a method to the class rather than to an instance, and is commonly used for alternative constructors that return an instance of that class.
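As a minimal sketch of that built-in decorator (the Point class and from_tuple constructor are illustrative examples, not part of remotemanager):

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    # @classmethod receives the class itself as the first argument,
    # so it can act as an alternative constructor
    @classmethod
    def from_tuple(cls, pair):
        return cls(pair[0], pair[1])

origin = Point.from_tuple((0, 0))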
Likewise, remotemanager also provides some decorators for use:

- RemoteFunction: allows users to "tag" extra functions to be brought along with a main Dataset
- SanzuFunction: operates similarly to sanzu, but can be used directly within scripts
RemoteFunction¶
It is a fundamental idea of programming to convert repetitive sections of code into functions, turning whole blocks into single lines.
Let's imagine a two-stage workflow for processing a number:

- the input is processed according to the rule: if num > 100 -> x0.5, else x2
- the output is formatted into a string for easy reading
Note
This is an extremely basic example, designed to show the limitations of a base Dataset.
[2]:
num = 200

if num > 100:
    num = num / 2
else:
    num = num * 2

num = float(num)
print('The result is', num)
The result is 100.0
Using Functions¶
While this works, it does not scale well to large sets of inputs. A standard way to improve scalability is to define functions which can be called on any input:
[3]:
# define the formatter
def formatted(temp):
    return f'The result is {float(temp)}'

# and now the processing function
def process(number):
    if number > 100:
        return formatted(number / 2)
    return formatted(number * 2)

print(process(74))
The result is 148.0
Single Function Limits¶
It is here that we run into an issue with remotemanager, as we can only define a single function for the Dataset to hold.

There are two paths we can take here using native Python:

- Refactor the workflow to be contained within a single function
- Use an inner function

Refactoring here is obviously trivial, however that approach can get cumbersome very quickly with even small increases in complexity. Let's start with option 2, inner functions:
[4]:
def process(number):
    """
    This function halves any number above 100,
    doubling otherwise.
    """
    def formatted(temp):
        return f'The result is {float(temp)}!'

    if number > 100:
        return formatted(number / 2)
    return formatted(number * 2)

print(process(200))
print(process(42))
The result is 100.0!
The result is 84.0!
The Third Option¶
Obviously this tutorial wouldn't exist if there wasn't some way around this limitation, so remotemanager takes this a step further and gives you a third option: allowing you to mark extra functions for sending. These functions are added in addition to the one placed within the Dataset. For this we use the RemoteFunction decorator.

This is useful, for example, when you have multiple Datasets holding different functions, but want a single formatting function for all jobs.

For this workflow we would be adding process to the Dataset, which means we should also indicate that we need formatted in this workflow:
[5]:
from remotemanager import Dataset, URL, RemoteFunction

@RemoteFunction
def formatted(temp):
    return f'The result is {float(temp)}!'

def process(number):
    if number > 100:
        return formatted(number / 2)
    return formatted(number * 2)

print(process(200))
print(process(42))
The result is 100.0!
The result is 84.0!
[6]:
url = URL()

ds = Dataset(process,
             url=url,
             skip=False)

ds.append_run({'number': 200})
ds.append_run({'number': 42})
appended run runner-0
appended run runner-1
[7]:
ds.run()
ds.wait(2)
ds.fetch_results()
print(ds.results)
Staging Dataset... Staged 2/2 Runners
Transferring for 2/2 Runners
Transferring 7 Files... Done
Remotely executing 2/2 Runners
Fetching results
Transferring 4 Files... Done
['The result is 100.0!', 'The result is 84.0!']
Expandability¶
You are not limited to a single extra function, so go wild! All stored functions are also available to all Datasets within the notebook in which they are defined, so you can further reduce boilerplate code in complex workflows where function sharing would be beneficial.
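As a minimal sketch of this sharing (the clip helper and the two process functions are illustrative names, not part of remotemanager), several tagged helpers can back the main functions of more than one Dataset:

from remotemanager import Dataset, URL, RemoteFunction

# tag as many helpers as needed; each is sent along with the Dataset
@RemoteFunction
def clip(value, limit=1000):
    return min(value, limit)

@RemoteFunction
def formatted(temp):
    return f'The result is {float(temp)}!'

# two different main functions, both relying on the same tagged helpers
def process_double(number):
    return formatted(clip(number * 2))

def process_half(number):
    return formatted(clip(number / 2))

ds_double = Dataset(process_double, url=URL(), skip=False)
ds_half = Dataset(process_half, url=URL(), skip=False)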
Warning
The cached functions are not stored within the databases themselves, so they must always be defined within your notebook.
SanzuFunction¶
Added in version 0.10.9.
Similar to sanzu, but instead of executing a cell, you can designate a function to be remotely callable.
Note
You can pass any run_args you may need directly to the decorator, and they will be attributed to the Dataset that is created.
[8]:
from remotemanager import URL, SanzuFunction

url = URL("localhost")

@SanzuFunction(url=url)
def execute_remotely(x, y):
    return x * y

print(execute_remotely(x=10, y=9))
appended run runner-0
Staging Dataset... Staged 1/1 Runners
Transferring for 1/1 Runners
Transferring 5 Files... Done
Remotely executing 1/1 Runners
Fetching results
Transferring 2 Files... Done
90
Choosing a Remote¶
Passing a URL at the decorator level will "bake" that url into the function, causing it to always be called on that remote. Without passing a url, however, the Dataset will use its default_url property, if set.
Note
To set the default_url property, you can import Dataset and update it with Dataset.default_url = URL(...)
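For example, a minimal sketch of that (using the same localhost URL as in the cell above):

from remotemanager import Dataset, URL

# any Dataset (or SanzuFunction) created without an explicit url
# will now fall back to this remote
Dataset.default_url = URL("localhost")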
Added in version 0.10.10: SanzuFunctions can now be called with non-keyword (positional) args.
[9]:
print(execute_remotely(7, 3))
appended run runner-1
Staging Dataset... Staged 1/2 Runners
Transferring for 1/2 Runners
Transferring 5 Files... Done
Remotely executing 1/2 Runners
Fetching results
Transferring 2 Files... Done
21